Shoubin Yu

STORM: Internalized Modeling for Spatial-Temporal Reasoning in Video-Language Models

May 25, 2026
Via arXiv

EgoMemReason: A Memory-Driven Reasoning Benchmark for Long-Horizon Egocentric Video Understanding

May 11, 2026
Via arXiv

Ego2Web: A Web Agent Benchmark Grounded in Egocentric Videos

Mar 23, 2026
Via arXiv

VisionCoach: Reinforcing Grounded Video Reasoning via Visual-Perception Prompting

Mar 15, 2026
Via arXiv

Balancing Faithfulness and Performance in Reasoning via Multi-Listener Soft Execution

Feb 18, 2026
Via arXiv

When and How Much to Imagine: Adaptive Test-Time Scaling with World Models for Visual Spatial Reasoning

Feb 09, 2026
Via arXiv

Video-RTS: Rethinking Reinforcement Learning and Test-Time Scaling for Efficient and Enhanced Video Reasoning

Jul 09, 2025
Via arXiv

4D-LRM: Large Space-Time Reconstruction Model From and To Any View at Any Time

Jun 23, 2025
Via arXiv

Movie Facts and Fibs (MF$^2$): A Benchmark for Long Movie Understanding

Jun 06, 2025
Via arXiv

Training-free Guidance in Text-to-Video Generation via Multimodal Planning and Structured Noise Initialization

Apr 11, 2025
Via arXiv